Grid Data Mining by Means of Learning Classifier Systems and Distributed Model Induction
Abstract
This paper introduces a distributed data mining approach, based on a supervised learning classifier system, that is suited to grid computing environments. It explores different methods of merging data mining models generated at different distributed sites. Centralized Data Mining (CDM) is the conventional method of mining distributed data: the data stored in distributed locations must be collected into a central repository before the data mining algorithm is executed. The CDM method is reliable, but it is expensive, with high computational, communication, and implementation costs. The Distributed Data Mining (DDM) approach is more economical, but it has limitations in combining local models. In DDM, the data mining algorithm is executed at each site to induce a local model, and the induced local models are then collected and combined to form a global data mining model. In this work, six different tactics are used to construct the global model in DDM: the Generalized Classifier Method (GCM), the Specific Classifier Method (SCM), the Weighted Classifier Method (WCM), the Majority Voting Method (MVM), the Model Sampling Method (MSM), and the Centralized Training Method (CTM). Preliminary experiments were conducted with two synthetic data sets (the eleven multiplexer and monks3) and one real-world data set (intensive care medicine). The initial results show that the performance of the DDM methods is competitive with that of the CDM methods.
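As a rough illustration of two terms in the abstract, the sketch below shows the target function of the eleven multiplexer benchmark and a generic plurality vote over the predictions of local models, in the spirit of the Majority Voting Method. It is not the authors' implementation. The bit ordering (three address bits first, most significant first), the function names `multiplexer11` and `majority_vote`, and the first-seen tie-break rule are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def multiplexer11(bits: Sequence[int]) -> int:
    """Eleven multiplexer target: 3 address bits select one of 8 data bits.

    Assumed layout (hypothetical ordering): bits[0:3] are the address,
    most significant bit first; bits[3:11] are the data bits.
    """
    if len(bits) != 11:
        raise ValueError("expected 11 bits")
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def majority_vote(local_predictions: Sequence[Hashable]) -> Hashable:
    """Combine the class predictions of several local models by plurality vote.

    Ties go to the label encountered first, because Counter.most_common
    orders equal counts by first insertion (Python 3.7+). This tie-break
    is an assumption, not taken from the paper.
    """
    if not local_predictions:
        raise ValueError("need at least one local prediction")
    return Counter(local_predictions).most_common(1)[0][0]
```

For example, if three hypothetical site models predict 1, 0 and 1 for the same instance, `majority_vote([1, 0, 1])` returns 1, which would be the global model's answer under this simple voting scheme.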
Similar resources
Supervised Learning Classifier Systems for Grid Data Mining
This paper explores the parallel and distributed implementation of Learning Classifier System (LCS) technology. Specifically, it studies the adaptation of supervised LCS to the requirements of grid data mining, using the agent paradigm. The paper also examines the possibility of inducing competitive data mining models with homogeneous and heterogeneous data. A distributed framework is proposed using t...
A Grid Data Mining Architecture for Learning Classifier Systems
Recently, there has been growing interest among researchers and software developers in exploring Learning Classifier Systems (LCS) implemented in parallel and distributed grid structures for data mining, due to their practical applications. The paper highlights some aspects of the LCS and studies the competitive data mining model with homogeneous data. In order to establish more efficient dist...
Agent-Based Learning Classifier Systems for Grid Data Mining
Grid Data Mining tools must be able to cope with very large, high-dimensional and frequently heterogeneous data sets that are geographically distributed, stored in different types of repositories, produced by different devices, and retrieved through different protocols. This paper presents an agent-based version of a Learning Classifier System. An experimental study was conducted in a comp...
A Probabilistic Bayesian Classifier Approach for Breast Cancer Diagnosis and Prognosis
Medical diagnosis problems are the most effective component of treatment policies. Recently, significant advances have been made in the field of medical diagnosis using data mining techniques. Data mining, or knowledge discovery, is the search of large databases to discover patterns and evaluate the probability of future occurrences. In this paper, a Bayesian Classifier is used as a Non-linear dat...
Publication date: 2011